Abstract — The genetic systems controlling blood groups in 
humans have drawn much attention from researchers. 
These findings are important for understanding 
human genetics. Genes are transmitted from one generation 
to the next through the gametes. 
The aim of this paper is to estimate the changes in blood group 
using the long-run state. This analysis is useful in the study of 
blood-type inheritance and inherited diseases. 

Index Terms — ABO system, allele frequency, genotype 
frequency, maximum likelihood estimation. 

I. INTRODUCTION 

Blood group testing plays a key role in medical 
treatment prior to blood transfusion and childbirth. The 
blood group of a person does not change within that person's 
lifetime, and so it is used as a genetic marker in 
research. The blood group is determined by the genetic 
make-up of the alleles of a system. The present study was 
undertaken to estimate the existing frequencies of the alleles 
and the expected frequencies of the genotypes. 

The best known and medically most important blood types 
are in the ABO group. All humans and many other primates 
can be typed for the ABO blood group. There are four 
principal types: A, B, AB and O. Two antigens and two 
antibodies are mostly responsible for the ABO types. The 
table below shows the possible permutations of antigens and 
antibodies with the corresponding ABO type. 

Table 1: Permutation of antigens and antibodies 

ABO blood type   Antigen A   Antigen B   Antibody anti-A   Antibody anti-B
A                Yes         No          No                Yes
B                No          Yes         Yes               No
O                No          No          Yes               Yes
AB               Yes         Yes         No                No


Research carried out in Heidelberg, Germany by 
Ludwik Hirszfeld and Emil von Dungern in 1910 and 1911 
showed that the ABO blood types are inherited. An 
individual's ABO type results from the inheritance of one of three 
alleles (A, B or O) from each parent. The possible 
combinations of alleles produce blood types as shown in the 
table below [3]. 


Table 2: Combination of alleles 

Parent alleles   A         B         O
A                AA (A)    AB (AB)   AO (A)
B                AB (AB)   BB (B)    BO (B)
O                AO (A)    BO (B)    OO (O)


In the above table the offspring receives one of the three 
alleles from each parent, giving rise to six genotypes (AA, 
AO, AB, BB, BO, OO) and four possible phenotypes 
(A, B, O, AB). Both A and B are dominant over O. As a result, 
individuals who have an AO genotype will have an A 
phenotype. People who have an OO genotype will have an O 
phenotype. In other words, they inherited a recessive O allele 
from both parents. The A and B alleles are codominant. 
Therefore, if an A is inherited from one parent and a B from 
the other, the phenotype will be AB [1], [7]. 
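
The dominance rules above can be written down compactly. The following is a minimal sketch in Python, not part of the original study; the function names `phenotype` and `offspring_phenotypes` are chosen here purely for illustration.

```python
def phenotype(genotype):
    """Return the ABO phenotype for a two-letter genotype such as 'AO'."""
    if len(genotype) != 2 or any(allele not in "ABO" for allele in genotype):
        raise ValueError("genotype must consist of two alleles from A, B, O")
    alleles = set(genotype)
    if alleles == {"A", "B"}:
        return "AB"          # A and B are codominant
    if "A" in alleles:
        return "A"           # A is dominant over O
    if "B" in alleles:
        return "B"           # B is dominant over O
    return "O"               # only OO gives the recessive O phenotype


def offspring_phenotypes(parent1, parent2):
    """List the phenotypes possible when each parent passes on one allele."""
    return sorted({phenotype(a + b) for a in parent1 for b in parent2})


print(offspring_phenotypes("AO", "BO"))
```

For example, parents with genotypes AO and BO can have children of all four phenotypes, because the child may receive A or O from one parent and B or O from the other.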

Population genetics studies the frequencies of genotypes 
and alleles within populations rather than the ratios of 
phenotypes. Estimates of gene frequencies provide 
valuable information on the genetic similarity of different 
populations and, to some extent, on their ancestral genetic 
linkage. In a constant environment, genes will continue to 
sort similarly for generations upon generations. The 
observation of this constancy led two researchers, G. H. Hardy 
and W. Weinberg, to express an important relationship in 
evolution known as the Hardy-Weinberg equilibrium, which 
serves as the null model for population genetics. It applies 
basic rules of probability to a population to make predictions 
about the next generation. 

The allele frequency (and genotype frequency) of a 
population remains constant over generations, unless a 
specific factor or combination of factors disrupts this 
equilibrium. Such factors might include non-random mating, 
mutation, natural selection, genetic bottlenecks leading to 
increased genetic drift, and the immigration or emigration of 
individuals [2], [4], [5]. 
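
The constancy of allele frequencies under random mating can be checked numerically. The sketch below is an illustration only: the starting frequencies are hypothetical values, not data from this study, and the helper `next_generation` is a name introduced here. It forms genotype frequencies as products of allele frequencies and then recounts the alleles.

```python
from itertools import product


def next_generation(allele_freqs):
    """One round of random mating with no disturbing factors.

    Returns the genotype frequencies and the allele frequencies
    recounted from those genotypes.
    """
    genotype_freqs = {}
    for (a, pa), (b, pb) in product(allele_freqs.items(), repeat=2):
        key = "".join(sorted(a + b))          # 'OA' and 'AO' are the same genotype
        genotype_freqs[key] = genotype_freqs.get(key, 0.0) + pa * pb

    new_freqs = {allele: 0.0 for allele in allele_freqs}
    for genotype, freq in genotype_freqs.items():
        for allele in genotype:               # each genotype carries two alleles
            new_freqs[allele] += freq / 2
    return genotype_freqs, new_freqs


# Hypothetical starting frequencies, chosen only for illustration.
start = {"A": 0.2, "B": 0.5, "O": 0.3}
genotypes, after = next_generation(start)
print(genotypes)
print(after)
```

Under these assumptions the recounted allele frequencies equal the starting ones, and heterozygotes such as AO appear with frequency 2 P_A P_O.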

II. MODEL DESCRIPTION 

Collect a sample of individuals having A, B, AB, O 
phenotypes. Let $n_A$ denote the number of individuals having 
A phenotype, $n_B$ the number of B, $n_{AB}$ the 
number of AB and $n_O$ the number of individuals 
with O phenotype, so that $n_A + n_B + n_{AB} + n_O = n$. 
If $P_A$, $P_B$ and $P_O$ denote the allele frequencies such that 
$P_A + P_B + P_O = 1$, then the expected probabilities of the 
phenotypes are as follows. 


Table 3: Expected probabilities 

Phenotype   Genotype   Probability
A           AA, AO     $P_A^2 + 2 P_A P_O$
B           BB, BO     $P_B^2 + 2 P_B P_O$
O           OO         $P_O^2$
AB          AB         $2 P_A P_B$


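The likelihood is given as 

\[ L(P_A, P_B, P_O) = (P_A^2 + 2 P_A P_O)^{n_A} (P_B^2 + 2 P_B P_O)^{n_B} (2 P_A P_B)^{n_{AB}} (P_O^2)^{n_O}. \]

Since $P_O$ is redundant, we substitute $1 - P_A - P_B$ for $P_O$ in 
the above expression and, taking the logarithm, obtain the 
log-likelihood as 

\[ \log L(P_A, P_B) = n_A \log\left(P_A^2 + 2 P_A (1 - P_A - P_B)\right) + n_B \log\left(P_B^2 + 2 P_B (1 - P_A - P_B)\right) + n_{AB} \log(2 P_A P_B) + 2 n_O \log(1 - P_A - P_B). \]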
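
As an illustration, the log-likelihood can be evaluated directly. This is a minimal sketch using only the Python standard library; the function name `abo_log_likelihood` is an assumption made here, and the counts in the usage lines are the observed phenotype frequencies listed in Table 4.

```python
import math


def abo_log_likelihood(p_a, p_b, n_a, n_b, n_ab, n_o):
    """Log-likelihood of the ABO phenotype counts, with P_O = 1 - P_A - P_B."""
    p_o = 1.0 - p_a - p_b
    if min(p_a, p_b, p_o) <= 0.0:
        raise ValueError("allele frequencies must be positive and sum to 1")
    return (
        n_a * math.log(p_a**2 + 2 * p_a * p_o)
        + n_b * math.log(p_b**2 + 2 * p_b * p_o)
        + n_ab * math.log(2 * p_a * p_b)
        + 2 * n_o * math.log(p_o)
    )


# Observed counts taken from Table 4 (A, B, AB, O).
counts = dict(n_a=158, n_b=628, n_ab=121, n_o=93)
print(abo_log_likelihood(1 / 3, 1 / 3, **counts))      # equal starting frequencies
print(abo_log_likelihood(0.1543, 0.5100, **counts))    # estimates reported in Section III
```

A larger (less negative) value indicates allele frequencies that explain the observed counts better.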

To find the maximizer of the log-likelihood function we 
differentiate with respect to $P_A$ and $P_B$ and set the derivatives 
to zero. These equations are not easily solved in closed form, so 
we compute the maximum likelihood estimates by means of the EM 
algorithm [6]. 

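By choosing $P_A$, $P_B$ and $P_O$ we could determine how many 
individuals with A phenotype have the AA genotype and the AO 
genotype using 

\[ n_{AA} = n_A \left[ \frac{P_A^2}{P_A^2 + 2 P_A P_O} \right] \quad \text{and} \quad n_{AO} = n_A \left[ \frac{2 P_A P_O}{P_A^2 + 2 P_A P_O} \right]. \]

Similarly for B phenotype 

\[ n_{BB} = n_B \left[ \frac{P_B^2}{P_B^2 + 2 P_B P_O} \right] \quad \text{and} \quad n_{BO} = n_B \left[ \frac{2 P_B P_O}{P_B^2 + 2 P_B P_O} \right], \]

and of course $n_{AB} = n_{AB}$, $n_{OO} = n_O$, which is the expectation part 
of the EM algorithm. 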

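After estimating the genotype frequencies we could estimate 
the allele frequencies by gene counting using 

\[ P_A = \frac{2 n_{AA} + n_{AO} + n_{AB}}{2n}, \quad P_B = \frac{2 n_{BB} + n_{BO} + n_{AB}}{2n} \quad \text{and} \quad P_O = \frac{2 n_{OO} + n_{AO} + n_{BO}}{2n}, \]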


which is the maximization part of the EM algorithm. It is 
called maximization because we calculate the maximum 
likelihood estimates of the allele frequencies given the 
estimated genotype counts. The expectation and maximization 
steps are repeated until the estimates stop changing, which gives 
the maximum likelihood estimates of the allele 
frequencies; the method assumes that the genotypes are 
found in Hardy-Weinberg proportions. Once the maximum 
likelihood estimator is derived, the general theory of 
maximum likelihood estimation provides standard errors, 
statistical tests and other results useful for statistical 
inference. 
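
The expectation and gene-counting steps described above can be sketched as follows. This is an illustrative implementation under the stated Hardy-Weinberg assumption, not the authors' code; the function names, the convergence tolerance and the iteration cap are choices made here. The counts are the observed phenotype frequencies of Table 4, in the order A, B, AB, O.

```python
def e_step(counts, freqs):
    """Split phenotype counts into expected genotype counts."""
    n_a, n_b, n_ab, n_o = counts
    p_a, p_b, p_o = freqs
    n_aa = n_a * p_a**2 / (p_a**2 + 2 * p_a * p_o)
    n_bb = n_b * p_b**2 / (p_b**2 + 2 * p_b * p_o)
    return {"AA": n_aa, "AO": n_a - n_aa,
            "BB": n_bb, "BO": n_b - n_bb,
            "AB": n_ab, "OO": n_o}


def m_step(g):
    """Re-estimate allele frequencies by gene counting."""
    two_n = 2 * sum(g.values())
    p_a = (2 * g["AA"] + g["AO"] + g["AB"]) / two_n
    p_b = (2 * g["BB"] + g["BO"] + g["AB"]) / two_n
    p_o = (2 * g["OO"] + g["AO"] + g["BO"]) / two_n
    return p_a, p_b, p_o


def em_abo(counts, start=(1 / 3, 1 / 3, 1 / 3), tol=1e-10, max_iter=1000):
    """Alternate E and M steps until the largest change falls below tol."""
    freqs = start
    iterations = 0
    for iterations in range(1, max_iter + 1):
        new_freqs = m_step(e_step(counts, freqs))
        change = max(abs(x - y) for x, y in zip(new_freqs, freqs))
        freqs = new_freqs
        if change < tol:
            break
    return freqs, iterations


counts = (158, 628, 121, 93)                      # n_A, n_B, n_AB, n_O from Table 4
first_pass = m_step(e_step(counts, (1 / 3, 1 / 3, 1 / 3)))
estimates, n_iter = em_abo(counts)
print("after one iteration:", first_pass)
print("at convergence:", estimates, "iterations:", n_iter)
```

Starting from equal allele frequencies, the converged values should agree closely with the estimates reported in Section III. The stopping rule based on the largest change in any allele frequency is one common choice; monitoring the log-likelihood would serve equally well.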


III. DATABASE 


The present study comprised 1000 individuals. 
Data on the ABO blood group were collected from hospitals. 
The observed frequencies were $n_A = 158$, $n_B = 628$, 
$n_{AB} = 121$ and $n_O = 93$, so that 
$n_A + n_B + n_{AB} + n_O = n = 1000$. As starting values we 
assume $P_A = 0.3333$, $P_B = 0.3333$ and $P_O = 0.3333$. 


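Now 

\[ n_{AA} = n_A \left[ \frac{P_A^2}{P_A^2 + 2 P_A P_O} \right] = 158 \left[ \frac{0.3333^2}{0.3333^2 + 2(0.3333)(0.3333)} \right] = 52.6667, \]

\[ n_{AO} = n_A \left[ \frac{2 P_A P_O}{P_A^2 + 2 P_A P_O} \right] = 158 \left[ \frac{2(0.3333)(0.3333)}{0.3333^2 + 2(0.3333)(0.3333)} \right] = 105.3333, \]

\[ n_{BB} = n_B \left[ \frac{P_B^2}{P_B^2 + 2 P_B P_O} \right] = 628 \left[ \frac{0.3333^2}{0.3333^2 + 2(0.3333)(0.3333)} \right] = 209.3333, \]

\[ n_{BO} = n_B \left[ \frac{2 P_B P_O}{P_B^2 + 2 P_B P_O} \right] = 628 \left[ \frac{2(0.3333)(0.3333)}{0.3333^2 + 2(0.3333)(0.3333)} \right] = 418.6667, \]

$n_{AB} = 121$ and $n_{OO} = n_O = 93$ is the 
approximate estimation of gene frequency. 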

Using these estimates, the approximate estimation of the allele 
frequencies is given below: 

\[ P_A = \frac{2 n_{AA} + n_{AO} + n_{AB}}{2n} = \frac{2(52.6667) + 105.3333 + 121}{2(1000)} = 0.1658, \]

\[ P_B = \frac{2 n_{BB} + n_{BO} + n_{AB}}{2n} = \frac{2(209.3333) + 418.6667 + 121}{2(1000)} = 0.4792, \]

\[ P_O = \frac{2 n_{OO} + n_{AO} + n_{BO}}{2n} = \frac{2(93) + 105.3333 + 418.6667}{2(1000)} = 0.3550. \]


The approximate solution obtained by the 
Hardy-Weinberg method is 
$P_A = 0.1658$, $P_B = 0.4792$ and $P_O = 0.3550$. On 
repeating this process seven more times using the estimated 
values, the better estimates due to the Hardy-Weinberg method are 
$P_A = 0.1543$, $P_B = 0.5100$ and $P_O = 0.3357$, which are the 
maximum likelihood estimates. The genotype frequencies 
attained by the Hardy-Weinberg method are $n_{AA} = 29.53$, 
$n_{AO} = 128.47$, $n_{BB} = 271.10$, $n_{BO} = 356.90$, 
$n_{AB} = 121$ and $n_{OO} = 93$, which 
means that 29.53 of the A phenotypes in the sample are estimated to 
have the AA genotype and the remaining 128.47 the AO genotype. 
Similarly, we have estimated that 271.10 of the B phenotypes in 
the population sample have the BB genotype and the remaining 
356.90 have the BO genotype. By the time we have repeated this 
process seven more times, the allele frequencies attain a 
stationary value. The above result also reveals that the 
population allele frequencies are closer for the B and O types and 
that the BO genotype frequency is the highest. 

To investigate whether the genotype frequencies are 
compatible with the basic principle, we test the assumptions 
for the genotype frequencies using the chi-square test. We first 
formulate a conservative hypothesis, called the null 
hypothesis ($H_0$), which states that there is no difference 
between the observed and expected values, that is, the 
population is in Hardy-Weinberg equilibrium. 
The observed frequencies and the expected frequencies 
obtained from the maximum likelihood estimates are 
shown in the table below. 


Table 4: Expected value of the phenotype 

Phenotype   Observed frequency   Probability           Probability value   Expected frequency
A           158                  $P_A^2 + 2 P_A P_O$   0.1274              127.4
B           628                  $P_B^2 + 2 P_B P_O$   0.6027              602.7
O           93                   $P_O^2$               0.1127              112.7
AB          121                  $2 P_A P_B$           0.1574              157.4


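The $\chi^2$ statistic for goodness of fit is given by 

\[ \chi^2 = \sum \frac{(O - E)^2}{E} = 20.27. \]

The object is to test for a significant difference with one 
degree of freedom at the 5% level. The $\chi^2$ value exceeds the 
critical value of 3.841, which reveals a significant difference 
between the observed and expected frequencies that is unlikely 
to be due solely to chance. 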
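
The goodness-of-fit statistic can be recomputed from the observed and expected frequencies in Table 4. The sketch below uses only the Python standard library; the critical value 3.841 is the standard tabulated 5% point of the chi-square distribution with one degree of freedom (four phenotype classes, minus one, minus the two independently estimated allele frequencies).

```python
# Observed and expected phenotype frequencies from Table 4.
observed = {"A": 158, "B": 628, "O": 93, "AB": 121}
expected = {"A": 127.4, "B": 602.7, "O": 112.7, "AB": 157.4}


def chi_square(obs, exp):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((obs[k] - exp[k]) ** 2 / exp[k] for k in obs)


CRITICAL_5PCT_1DF = 3.841   # tabulated chi-square value, 1 d.f., 5% level

statistic = chi_square(observed, expected)
reject_null = statistic > CRITICAL_5PCT_1DF
print(f"chi-square = {statistic:.2f}, reject H0: {reject_null}")
```

The largest contributions come from the A and AB classes, where the observed counts differ most from the expected ones.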


IV. CONCLUSION 

From the above result, the genotype and allele frequencies 
have been obtained. It reveals that the population allele 
frequencies are closer for the B and O types and that the BO 
genotype frequency is the highest. The probability value 
also suggests that the B phenotype, which comprises the BB and 
BO genotypes, occurs most often among the individuals. It also 
reveals that the genotype frequency is a function of the allele 
frequency and that the equilibrium was attained in the seventh 
generation, independent of the initial genotype frequency. 
Hence, after the seventh generation the transmission of genes is 
completely stopped and the changes in blood group occur, 
i.e., the offspring receive new alleles from their parents' alleles 
and not from their ancestors' alleles. Also, the inherited disease will 
not be generated through the blood during the next 
generation. This will be helpful in decision making during 
blood transfusion. 